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tH- ' Abstract The maximum likelihood estimate of the correlation parameter of a Gaus- 
CS| . sian process with and without of a nugget term is studied in the case of the analysis 
of deterministic models. 

1 Introduction 

-(— > 

^1 ' The Gaussian process method is an elegant way to analyze the results of experiments 

in many areas of science including machine learning (fRasmussen and Wi lliams 20061 1. 
spatial statistics (IMatheron 1973l|Ripley 1981|ICressie 1993.,Muller 2007]l, and the 



Bayesian analysis of computer experiments ( Sacks, Welch, Mitchell, and Wynn 1989 



Kennedy and O'Hagan 2001 [Santner, Williams, and Notz 2003 1. Each area has its 



> 

in 

00 . 

^f\ , own specific ways of employing and interpreting the Gaussian processes. The pur- 

'!;;j- ■ pose of this paper is not to give a full overview, that can be found in the above 

references, but to discuss some issues on the nugget term for the analysis of com- 
f"^ ■ puter experiments. 

^^ , The conception of the nugget term was first introduced in geostatistics by 

IMatheron (1962)| Roughly speaking, the variogram and covariance often show a 
discontinuity at the origin, termed the nugget effect. The nugget effect is considered 
.J as a random noise and may represent a measurement error or short scale variabili- 

rS I ties. The nugget term is a well explored object in spatial statistics (IPitard 1993l l. 

j^ ■ Another area of the application of Gaussian processes is the Bayesian approach 

developed for the analysis of computer experiments. In this approach, a so-called 
emulator is introduced for making probabilistic judgments on the true output of the 
given computer model, which is called a simulator. A Gaussian process is used for 
a full probabilistic specification of the emulator. Thus, the emulator is utilized to 



measure uncertainty of different kinds, see (Kennedy and O'Hagan 2001 1. 
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Formally, there is no nugget term in the Gaussian process method for the analysis 
of deterministic models, but the nugget term can be introduced artificially, for exam- 
ple, for the regularization of the inversion of a covariance matrix, see dNeal 1997] l 



for details. Gramacy and Lee (2009) reported on the usefulness of the nugget term 
in their research of supercomputer experiments. 

The presence of the nugget term in the Gaussian process method is natural for 
the analysis of stochastic and simulation models. The nugget effect may represent 
a measurement error or an effect of random values used inside computer models 



(Kleijnen 2008 Kleijnen and van Beers 2005 1 



The influence of the nugget term for optimal designs of experiments for a number 
of cases have been studied in jZhu and Stein 20051 Stehlfk, Rodriguez-Diaz, Miiller, and Lopez-Fidalgo 2008 1 



The present paper focuses on the Gaussian process method applied for the anal- 
ysis of deterministic models. It is shown that the nugget term has a great impact on 
the Ukelihood and the estimate of correlation parameter 



2 The likelihood for a Gaussian process without the nugget term 

In this section, it is shown that the likelihood of a Gaussian process has an unex- 
pected behaviour in the analysis of non-stochastic models. More precisely, for a 
deterministic model of observations, the maximum likelihood estimate of the cor- 
relation parameter may tend to the infinity as the number of points increases. It 
means that a deterministic model is approximated by a Gaussian process with the 
correlation function r{x) w 1 for any x. 

Indeed, let y,- ~ ri{xi) be the output of the model ri{x) at the point x,- e [0, 1], 
i= !,...,«. Note that for a deterministic model, the replication of an observation at 
some point gives the same output. Without loss of generality, let xi < . . . < x„. The 
likelihood for a Gaussian process with constant mean )3, variance a^ and correlation 
function r{x,x) = e^l-^^-^i/v have the form 



|/?|-l/2 

piy\P,<y,¥) 



J—i,._HR\'r p-t 



{y-Hli)'R-'{y-Hli) 



(27rc72)"/2 



where y = (yi, . . . jjn)"^ is the vector of output values, R = (r(x,-,x,|i//'))"; j is the 
correlation matrix, H — {h{xi),. . . ,h{x„)), and h{x) = 1. 

The maximum likelihood (ML) estimates of j3 and G have the following explicit 
forms 

$ML = {H^R-^H)-^HR-^y 

and 

O-ML = -{y-HpMLfR-\y-HpML)- 
n 
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The ML estimate of y/ can be found only numerically in the following way 

V/ML ==arg max p{y\PML,^ML,^f)- 
After substituting and simplifying, we obtain that the estimate \jfML maximizes 



L{y/) = In 



\R\ 



-1/2 



■In 



{y-H^MLYR-'iy-H^ML) 



For the exponential correlation function, the inverse of matrix R admits the explicit 
representation R^^ = V^V where the matrix V is defined by 



V ^ 



1 

Al2 





1 



i^ 1 
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Hi = e ^^' ■^■'-i)/v For« equidistant points jf,- = (/— l)/(7i — 1), /= !,...,«, straight- 
forward calculation shows that 



yR-'y 



y\+yl 

1-A2 



n— 1 1 
1=2 ^ 



-A^ 



-A2 



n-l 

-2Xl3'/}',+i 
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A 



1-A2 



where A = e ("-i)v , and 



\R 



-1/2 _ 



1 



(l-A2)(«-l)/2' 

For the model T] (x) = x — 1 /2, we obtain that )3ml = and 

1 1 n^-5n + 6 \+X^ n^-2n-3 



yR-'y^ 



2 1-A2 12(n-l) 1-A2 6(n-l) 1-A2' 



The estimate i//ml can be found explicitly in Maple and is not presented since it is a 
very large expression. Applying the power series expansion, we have 



e {"-iWml — 1 



2 20 ^ / 1 \ , . « 7 7 

— - T-^ + O ^ and i/ML = o - 7 - 73- 
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18« 54«2 



0(4 



The dependence of Yml on « is given in Figure [T] for the model 77 (x) = x — 1/2 at 
the left part and for the model 77 (x) = sin(2;rx) at the right part. We observe that the 
estimate i/Zml increases almost linearly as « increases for both models. 

The maximum likelihood estimate of yf for the Gaussian correlation function 
r(x,x) = e^^'''^^' ''''is given in Figure |2] For the model 77(x) = x— 1/2 we have 
that Yml = °° for any «. Note that for the model Tj(x) ~ sin(2;rx), the condition 
numberof the correlation matrix /?(v>ml) is of order 10^, lO'"^, 1022, 10^'', and 10^^ 
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Fig. 1 The maximum likelihood estimate of ij/ for the Gaussian process with the exponential cor- 
relation function and n equidistant points on the interval [0, 1] for the model T)(x) =x— 1/2 (left 
part) and for the model rj (x) = sin(2;rx) (right part) for n = 6, . . . , 20. 



for n = 8, 11, 14, 17, and 20, respectively. These calculations were done in Maple 
with 45 digits precision. However, the computer representation of floating numbers 
typically has only 17 digits. Thus, it is impossible to find the maximum likelihood 
estimate for large « using the ordinary floating representation in a computer In par- 



ticular, Ababou, Bagtzoglou, and Wood (1994) have shown that the condition num- 
ber grows linearly for the exponential correlation function and grows exponentially 
for the Gaussian correlation function. 




Fig. 2 At left: The maximum likelihood estimate of \j/ for the Gaussian process with the Gaussian 
correlation function and n equidistant points on the interval [0, 1] for the model r](x) = sin(2;rx) 
for n = 6, . . . , 20. At right: The likelihood function of i// for n = 7, 14, 20. 



In more general situations for other correlation functions and other models, the 
dependence of the maximum likelihood estimate and the restricted maximum likeli- 
hood estimate of y/ on n remains typically the same and can be verified numerically 



( [Pepelyshev 2009[ l. 

Thus, roughly speaking, the estimate of parameters of a Gaussian process is as- 
sociated with the given data set and is not associated with the deterministic model. 
This estimation is not simple and is not well-defined. It is easy to observe that if one 
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divides an input space into several regions, one may get quite different estimates 
of parameters for different regions. However, if one is looking for one Gaussian 
process over the full space, one has difficulty in finding the single estimate. 



3 The likelihood for a Gaussian process with the nugget term 
3.1 MLEfor a Gaussian process 

In this section, the likelihood with the presence of the nugget term is investigated. 
For this case, the correlation matrix R in formulae from Section 2 should be replaced 
to the correlation matrix 

Rv^{{l-v)r{xi~Xj) + v5,,j)^. 

where v is the nugget term. 



0.1 
0.08 
0.06 
0.04 



v=0.02 

o O o o 
v=0.05 



o o 




0.1 0.2 0.3 V 0.4 



Fig. 3 At left: The maximum likelihood estimate of ^l for the Gaussian process with the Gaussian 
correlation function and the nugget term v = 0.02, 0.05 for measurements of the model 7] (x) = 
sin(2;rx) at n equidistant points on the interval [0, 1] , n = 6, . . . , 20. At right: The likelihood function 
of V' for the nugget term v = 0.02 and for n = 7, 14, 20. 



The Ukelihood function and the maximum likelihood estimate for fixed values 
of the nugget term are presented in Figure [3] One can observe that the nugget term 
essentially changes the maximum likelihood estimate of i/a (and also a). The esti- 
mate i/ml does not increase to infinity as n increases, since the Gaussian process is 
fitted to a band around the deterministic function. It should also be noted that the 
condition number of the correlation matrix Ra is of order 10^ and is increasing very 
slowly as n is increasing. Moreover, the estimate "^ml is smaller with the presence 
of the nugget term that also reduces the condition number of the correlation ma- 
trix. Ababou, Bagtzoglou, and Wood (1994) have shown that the condition number 



of the correlation matrix for the Gaussian process models increases to a finite limit 
with the presence of the nugget term. 
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Note one undesired effect of the nugget term. The UkeHhood may have the second 
mode for large values of the correlation parameter, see Figure |4] The second mode 
strongly depends on a value of the nugget term and can be considered as a false 
mode. For some data, the likelihood function at the second mode may have a larger 
value than at the first mode. 




Fig. 4 The likelihood function of y/ for the Gaussian process with the Gaussian correlation func- 
tion and the nugget term V = 0,0.01,0.001,0.0001 for measurements of the model 7j(.v) = sin(2;rx) 
at 7 equidistant points on the interval [0, 1]. 



Note that in the presence of the nugget term, the meta-model 

my{x)=Hl5+t^{x)R-\y-Hp) 

where t{x) ~ {r{x,xi),... ,r{x,Xn))^ does not possess the interpolation property. 
Nevertheless, the deviations e,- = y, — my{xj) are very small. One may construct a 
meta-model, that interpolates the dataset {{xi, £,)}"^[, by a method given in Cressie 
(1993, Sect. 5.9). It is not necessary for the deviations £, to use the Kriging approach 
without the nugget term. One may use the inverse distance weighted interpolation 
dCressie f993i p. 37 1 , Lu and Wong 2008 1 and define the meta-model in the follow- 
ing form 



ic/iix-x/i' 2 



m{x) ^mv{x) + '—^ 



^/|I2 



L \\x~x, 

i=l 



'112 



3.2 MLE for stationary processes 



Let us perform a small simulation study. Assume that the results of experiments 
satisfy 
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where xi,...,x„ are points of measurements, e'''(x) denotes a stationary Gaus- 
sian process with correlation function r{x) — e^"''^ and e'^'(x) is white noise. Let 
E£(-''(x) = 0, D£(-')(x) = 1, processes e(^)(x) and e(^)(x) be independent. The val- 
ues j3 + (7^e'''(x,) may be conceived as true values of a physical process. The val- 
ues T^e'^'(x,) may be interpreted as a measurement error or a rough rounding of 
measured values. Let us compute the maximum likelihood estimators for 1000 real- 
izations obtained for « = 8, X; = (/— l)/7, / = 1,...,8, j3 =2, v/= 1.5, ff = 1, T = 
or T = 0.01. Results of the maximum likelihood estimation for different values of 
the nugget term are presented in Table [T] 



Table 1 The mean of maximum likelihood estimators of parameters using different values of the 
nugget term. Standard deviations are given in brackets. 





T = 


T = 0.01 1 


V 





0.01 


0.02 





0.01 


0.02 




2.03(0.68) 
0.83(0.40) 
1.44(0.37) 


2.01(0.85) 
0.29(0.17) 
0.54(0.25) 


2.02(0.86) 
0.27(0.16) 
0.47(0.20) 


2.02(0.92) 
0.33(0.23) 
0.14(0.06) 


2.04(0.85) 
0.30(0.17) 
0.58(0.29) 


2.04(0.86) 
0.28(0.16) 
0.49(0.23) 



One can observe that the maximum likelihood estimators with a nonzero nugget 
term does not depend on small perturbations {r^e'^' (^i)}/ of the data {/3 + C7^e'^'' (.x:,)},. 
In contrast, for v = 0, the maximum likelihood estimators of a and \// are signifi- 
cantly changed due to adding small perturbations. In all cases, the accuracy of Pml 
is approximately the same. Thus, as can be seen, the nugget term yields a regular- 
ization effect on the maximum likelihood estimators. 



4 Conclusions 

In the analysis of deterministic models the presence of a nugget term has a sig- 
nificant impact on the likelihood of a Gaussian process. The maximum likelihood 
estimate of the correlation parameter with a nonzero nugget term is more reliable 
and the condition number of the correlation matrix is moderate. Even if a determin- 
istic model does not have any internal computational errors or other perturbations, 
the artificial introduction of the nugget term can be recommended. 
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